The objectives of this thesis were to design a method
for evaluation of the diagnostic potential of available
indicators of coronary heart disease (CHD) and to present a
systematic, quantitative procedure for aiding in its
diagnosis. A sample space of patients was divided into two
mutually exclusive groups, those with angiographic evidence
of CHD and those with no CHD. Active duty or retired
military men between the ages of 30 and 67 years constituted
the sample space. Tests and risk factors were available in
the medical literature that a doctor could view as an
indicator or contraindicator of CHD. A vector of these possible
indicators was established and the diseased group was
compared to the non-diseased group in an effort to evaluate
the diagnostic potential of the indicators. This was done
by discriminant analysis in conjunction with a Bayesian
method of weighting the importance of test results. The
important indicators were then used to formulate a model for
diagnosing CHD based on a Bayes' decision technique.
I. INTRODUCTION


Heart attacks resulting from coronary heart disease 
(CHD) cause more deaths each year than cancer, strokes, and 
accidents combined. These deaths also include a broader 
spectrum of the population than in previous years. In the 
last century, heart disease was viewed as a natural result 
of growing old. But with the transition from a rural to 
an urban society, and the inherent traits of tension, rich 
diet, and lack of exercise, the propensity for heart disease 
has increased. This increase can be seen in the steady 
rise in the number of heart attacks among men over the past 
20 years. The American Heart Association reported that of
the 675,000 deaths from CHD expected during the past year,
176,000 would have been men and women under the age of 65
[Ref. 16].

Medical capabilities have greatly increased, giving
coronary heart disease patients a greater probability of
survival once they are under medical care. But since over
half of those who die never reach a hospital, the problem
of predicting coronary heart disease becomes very important.
This diagnostic problem gains additional importance because 
of the lack of a proven method for the treatment of CHD in 
its advanced stages. Furthermore, there is an increased 
presence of asymptomatic CHD that may go undetected with 

In this study an attempt has been made to consolidate
a spectrum of risk factors that can be incorporated into
diagnostic procedures for CHD. Specifically, the objectives
were to design a method for evaluation of the diagnostic
potential of available indicators of CHD and to present a
systematic, quantitative procedure for aiding in its diagnosis.

A sample space of patients was divided into two mutually
exclusive groups, those with angiographic evidence of CHD
and those with no CHD. Active duty or retired military men
between the ages of 30 and 67 years constituted the sample
space. There were certain tests and risk factors available
in the medical literature that a doctor could view as an
indicator or contraindicator of the disease. Once a vector
of these possible indicators had been established, the
diseased group was compared to the nondiseased group in an
effort to evaluate the diagnostic potential of the indicators.
This was done by discriminant analysis in conjunction with
a Bayesian method of weighting the importance of test
results. The important indicators were then used to
formulate a model for diagnosing CHD based on a Bayes'
decision technique.





II. BACKGROUND 


Probabilistic and computer-aided designs to aid decision
makers in medical diagnosis have been a promising area of
research for some time, and an abundant literature on these
subjects exists [Refs. 8, 10]. They have had little impact
on the practice of medicine, however, and several
characteristic reasons have been given. Among them may be mentioned
insufficient data bases because of the poor quality, lack
of uniformity, or inaccessibility of medical records. In
addition, there appears to be a lack of understanding and
interface between the medical profession and those who
would apply probabilistic procedures to aid the medical
decision makers.

Recent years have shown an increase in research efforts
aimed at the prevention and diagnosis of CHD. At the
present time, however, coronary arteriography appears to
be the only completely definitive test for the disease
[Refs. 4, 12]. Unfortunately, this is a costly surgical
procedure that requires hospitalization and involves
definite mortality and morbidity factors, depending on the
age and health of the patient. Arteriography is currently
available only at large medical centers because of the
equipment and expertise required.

Some diagnostic models for CHD tend to consider only
symptomatic patients, usually those with typical angina.
This omits many subjects who are asymptomatic, a portion of
which may be suffering from silent heart disease.

The medical literature cites commonly accepted indicators
for CHD. Widely used indicators cited are history of
ischemic episodes, age, total cholesterol, triglycerides,
resting EKG, smoking, and family history [Refs. 4, 12, 16].
Less commonly used indicators that are also cited are race,
blood type, and blood pressure [Refs. 4, 9]. In addition,
the exercise test has recently gained widespread acceptance
as a good CHD indicator [Refs. 1, 6]. The relative
importance of this test in conjunction with other indicators
has not yet been thoroughly investigated.

It seems appropriate that a diagnostic model for
predicting CHD should investigate the potential of an
exhaustive list of indicators and tests for the disease.
This diagnostic model should also reduce the subjectivity
in the decision making of the doctor by increasing the
amount of objective evidence through the appropriate
indicators and tests.





III. DESCRIPTIVE MODEL


The flow of patients to a cardiac clinic is similar
to the input of any other specialty clinic. A patient may
be referred to the cardiologist by another doctor based
on the results of a physical examination or, if a person
believes that he is suffering from a cardiac or
cardiac-related illness, he may voluntarily seek the advice of the
specialist directly. In either case, by the time a patient
is admitted to the cardiologist's office, there is already
certain data on him that is available to the physician
without specified testing. From that point on, however,
the diagnosis of a possible heart disease is a function of
the doctor's ability to assign relative importance to the
appropriate indicators. Costs of associated testing, the
procedures available, the patient, and the patient's health
may also have a bearing on the doctor's ability to diagnose
correctly.

The cardiologist then may be viewed as a decision maker
who, for each patient, receives an amount of initial
information $I$ from which he initiates a sequence of decisions,
gaining additional information $I'$ as a result of testing.
Figure 1 shows a schematic of these decision processes.

As an illustration of the concepts implied in Figure 1,
consider that a patient is referred to the cardiologist
because he has symptoms of CHD. At decision node $D_1$ the
doctor evaluates the information he has available. Usually
this is information readily available in the patient's
medical record. Based on this information, the doctor has
two choices at $D_1$: diagnosis of the patient or requesting
additional testing. If, for example, the doctor chooses to
perform a test, decision node $D_2$ represents the choice the
doctor must make from the clinical tests available. Having
made the choice, $I'$ represents the information that results
from the outcome of the test. The doctor is again faced
with the decision to be made at $D_1$, but he now has the new
information $I'$ with which to reduce the uncertainty of his
diagnosis.

A summary is presented in Table 1 that shows the possible
path of a patient through a diagnostic sequence.





TABLE 1

PATIENT ADMITTED TO THE CARDIOLOGIST

(A) AVAILABLE INFORMATION ($I$)

    Race, Sex, Age, Height, Weight, Blood Pressure,
    Blood Type, Family History of Heart Disease,
    Smoking History, History of Ischemic Episodes

FURTHER TESTING SPECIFIED

(B) CLINICAL TESTS ($I'$)

    Resting EKG
    Exercise EKG
    Triglycerides
    Cholesterol
    Angiogram

This summary does not dictate a specified sequence of tests
or weightings of relative importance. The information in
(A) is data available (facts about the patient) that are
easily obtained without testing. Tests in (B) require
expert judgment or clinical procedures and, again, are not
ordered in any sequence of importance. In practice, not
all of the listed indicators are used for decision making.
Some may be considered by a particular doctor to be
unimportant. It is also difficult to assign subjective
probabilities to some of the indicators about which little is
known. Furthermore, it is impractical to correlate the
contributions of a large number of indicators without some
type of objective model.
IV. QUANTITATIVE METHODS


In general, there are two approaches to medical decision
problems. The first is to develop and perfect a model that
predicts as well as or better than a physician. The second
approach consists of improving ways to aggregate, weight,
and use information available to the physician so that his
personal diagnosis will be conducted from a substantially
sounder base. This latter approach, which is commonly
called "bootstrapping" [Ref. 10], was the one selected for
this study.

A set of CHD indicators was identified and evaluated
experimentally using discriminant analysis. A proposed
method of assigning weighting factors based on the "posterior
odds" of the various indicator levels was incorporated into
the analysis. These results were then integrated into a
Bayesian diagnostic model.


A. INDICATORS AND WEIGHTING FACTORS 

At decision node $D_2$ of Figure 1, the doctor must decide
what test to use next in his evaluation of the patient. To
do this he must have a knowledge of what indicators of CHD
have been evaluated and the amount of additional information,
$I'$, he can expect to obtain from these indicators.
Complicating the doctor's evaluation is the division of the
indicators into two types, qualitative and quantitative.
The quantitative indicators are tests in which the outcome
is represented on an acceptable numerical scale. Of the
indicators used in this paper, only triglycerides,
cholesterol, age, and blood pressure were quantitative variables.
The other indicators shown in Table 1 (except height and
weight which were not used) have results which have no
numerical scale and must be interpreted qualitatively.

For example, the indicator called history of ischemic
episodes requires the patient to verbalize his history of
chest pain. Also included in the category of qualitative
indicators are tests in which the result is numerical but
lacks meaning unless expressed in qualitative terms. The
exercise EKG result, for example, is in millimeters of
depression (or elevation) of the S-T segment, but is
interpreted in terms of being positive or negative.

As pointed out previously, these indicators and their
relative merit were determined from clinical judgment and
varied among cardiologists. In addition, the relative
importance of various outcomes of any specific test also varied
among doctors. To alleviate these problems, a two-step
procedure was used. First, the outcomes of the qualitative
tests were assigned weighting factors using Bayes' Theorem.
Second, the qualitative variables and quantitative variables
were integrated into a relative ranking using a stepwise
discriminant analysis computer routine [Ref. 11].

Consider a particular qualitative variable i for which
$P(t_{ij} \mid D)$ is the conditional probability of outcome j
given a patient has CHD. The posterior probability of CHD
(i.e., in light of this information) is

$$P(D \mid t_{ij}) = \frac{P(t_{ij} \mid D)\,P(D)}{P(t_{ij} \mid D)\,P(D) + P(t_{ij} \mid \bar{D})\,P(\bar{D})} \tag{1}$$

where $P(D)$ is the presumably known prior probability of CHD.
Each of these probabilities on the right hand side of
equation (1) can be estimated from past data. The results
are a vector of values for the outcomes of a specific test
which could then be used with the outcomes of other tests
in a stepwise discriminant analysis computer routine.
However, in order to give more meaning to the weighting
factors, $w_{ij}$, they were normalized using

$$w_{ij} = \frac{P(D \mid t_{ij})}{\min_{k} \{P(D \mid t_{ik})\}} \tag{2}$$

where it was arbitrarily decided to use the minimum outcome
in order to show increasing likelihood of disease as the
value of the weighting factor increased.

Consider the following simple example to illustrate the
procedure for computing weighting factors. Suppose it is
desirable to find weighting factors for the qualitative
variable "race" (i = R) which, for the purpose of illustration,
has two outcomes: NEGRO (j = 1) and CAUCASIAN (j = 2).
Suppose further that the prior distribution of CHD is
$P(D) = 0.1$ and data reveals that $P(t_{R1} \mid D) = 0.2$ and
$P(t_{R1} \mid \bar{D}) = 0.4$. With only two outcomes, the
complementary values are $P(t_{R2} \mid D) = 0.8$ and
$P(t_{R2} \mid \bar{D}) = 0.6$, so equation (1) gives
$P(D \mid t_{R1}) = 0.053$ and $P(D \mid t_{R2}) = 0.129$. It then
follows from equation (2) that the weighting factors are
$w_{R1} = 1.0$ and $w_{R2} = 2.45$.
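
The arithmetic of equations (1) and (2) can be checked with a short script. This is a minimal sketch, not the routine used in the study: the function names and the outcome labels are hypothetical, and it assumes that the two stated values are the probabilities of the first outcome given CHD (0.2) and given no CHD (0.4), with the second outcome taking the complements.

```python
def posterior(p_outcome_given_d, p_outcome_given_not_d, prior):
    """Equation (1): posterior probability of CHD given one test outcome."""
    numerator = p_outcome_given_d * prior
    return numerator / (numerator + p_outcome_given_not_d * (1.0 - prior))


def weighting_factors(cond_d, cond_not_d, prior):
    """Equation (2): each posterior divided by the smallest posterior."""
    post = {k: posterior(cond_d[k], cond_not_d[k], prior) for k in cond_d}
    smallest = min(post.values())
    return {k: p / smallest for k, p in post.items()}


# Assumed inputs: a two-outcome indicator with complementary probabilities.
cond_d = {"outcome_1": 0.2, "outcome_2": 0.8}      # P(outcome | CHD)
cond_not_d = {"outcome_1": 0.4, "outcome_2": 0.6}  # P(outcome | no CHD)

weights = weighting_factors(cond_d, cond_not_d, prior=0.1)
print({k: round(w, 2) for k, w in weights.items()})
```

Because every posterior is divided by the smallest one, the least indicative outcome always receives a weight of exactly 1 and the others scale upward from it.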

This method of computing the weighting factors
$\{w_{ij};\ i = 1,\ldots,n;\ j = 1,\ldots,m\}$ provides a consistent
means of assigning scores to each of the qualitative
variables. This was done for a particular set of indicators
examined in this study and the results are given in
Section VI. Stepwise linear discriminant analysis [Ref. 11]
could, at this point, be used to develop a linear prediction
function

$$L = \sum_{i=1}^{m} A_i X_i$$

where X is the set of all test variables (quantitative and
qualitative), A is the set of all coefficients assigned by
the computer routine, and m is the number of tests.
Mahalanobis distance could then be used as the discrimination
criterion.

ohn [Ref. 4] used this type of linear discriminant
analysis in its predictive role in a medical decision
context. Use of discriminant analysis for prediction was
discarded in this paper for two reasons. The technique is
a valid one when the underlying distributions of the random
variables of the two samples (in this case, the test results)
are distributed normally with equal covariance matrices (a
linearity assumption). A preliminary investigation indicated
that the variance of the test results in the two samples did
not appear to be equal. Additionally, the normality
assumption did not appear to be valid in this application.
The test results had a combination of binomial,
multinomial, and approximately normal distributions.
Consideration of all distributions as normal did not have a sound
theoretical basis.

The actual purpose of conducting this portion of the
analysis was to identify the relative importance among the
variables. This was accomplished by ordering the resulting
F-statistics associated with the coefficients (A's) of the
variables (X's). The F-statistic is the ratio of the
variability of the means of the individual test results in
each sample to the pooled variance of the test results. F
will be large when there is a large difference between the
mean results of a test in the CHD and the no CHD groups.
Likewise, the smaller the F, the closer together are the
mean results for a particular test in the CHD and no CHD
groups. Thus, an ordering of these computed F-statistics
from largest to smallest may be considered an ordinal
ranking of the diagnostic power of the various indicators.
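
The ranking idea can be sketched as follows. This is a simplified, univariate illustration only: it computes an ordinary two-group F ratio for each indicator separately, whereas the stepwise routine of Ref. 11 computes F values conditional on the variables already entered. The function names and the patient values are hypothetical and are not data from the study.

```python
from statistics import mean


def f_statistic(group_a, group_b):
    """Two-group F ratio: between-group mean square over pooled within-group variance."""
    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = mean(group_a), mean(group_b)
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    # Between-group sum of squares (one degree of freedom for two groups).
    between = n_a * (m_a - grand) ** 2 + n_b * (m_b - grand) ** 2
    # Pooled within-group sum of squares, n_a + n_b - 2 degrees of freedom.
    within = (sum((x - m_a) ** 2 for x in group_a)
              + sum((x - m_b) ** 2 for x in group_b))
    return between / (within / (n_a + n_b - 2))


def rank_indicators(chd, no_chd):
    """Order indicators from largest to smallest F."""
    scores = {name: f_statistic(chd[name], no_chd[name]) for name in chd}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical test results for four patients in each group.
chd = {"age": [55, 60, 58, 62],
       "cholesterol": [240, 250, 230, 260],
       "systolic_bp": [130, 125, 140, 135]}
no_chd = {"age": [40, 45, 42, 47],
          "cholesterol": [220, 245, 235, 250],
          "systolic_bp": [128, 132, 138, 130]}

ranking = rank_indicators(chd, no_chd)
for name, f_value in ranking:
    print(f"{name:12s} F = {f_value:.3f}")
```

In this made-up sample the group means for age are far apart relative to the spread within each group, so age ranks first, while systolic blood pressure, with nearly equal means, ranks last.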


B. BAYESIAN DIAGNOSTIC MODEL

The foregoing procedure of Section IV.A. for determining
the relative diagnostic power of the available tests or
indicators provides criteria for the cardiologist to select
appropriate tests at decision node $D_2$ in Figure 1. A
Bayesian method for quantifying the information $I$ and
additional information $I'$ is now presented.

The development of this model was based on two major
assumptions. First, it was assumed that patients being
tested either had CHD or did not have CHD. Thus, the case
of a patient having multiple diseases was excluded here.
The second assumption was that the data on both
qualitative and quantitative variables were conditionally
independent.
Let

$P(D_i)$ = a priori probability of CHD ($D_1$), or no CHD ($D_2$),

$P(D_i \mid S_1,\ldots,S_n)$ = a posteriori probability of $D_i$ given
symptoms, or indicator levels, $S_1,\ldots,S_n$,

$P(S_1,\ldots,S_n \mid D_i)$ = conditional probability of symptoms
$S_1,\ldots,S_n$ given $D_i$.

The first assumption merely requires that $P(\bar{D}_1) = P(D_2)$
or $P(\bar{D}_2) = P(D_1)$. The second assumption, in terms of the
above notation, says that

$$P(S_1,\ldots,S_n \mid D_i) = \prod_{j=1}^{n} P(S_j \mid D_i) \tag{3}$$

It then follows from Bayes' Theorem that

$$P(D_i \mid S_1,\ldots,S_n) = \frac{P(D_i) \prod_{j=1}^{n} P(S_j \mid D_i)}{\sum_{k=1}^{2} P(S_1,\ldots,S_n \mid D_k)\,P(D_k)}$$

which is in terms that can be calculated using subjective
probabilities (doctor's medical opinions) and frequentistic
procedures [Ref. 8].

The majority of the conditional probabilities were
calculated using frequentistic procedures. Subjective
probabilities were used when the data base was insufficient.
In cases where a patient was missing the symptom on his
medical records or the patient was unable to take the test,
the conditional probabilities $P(S_j \mid D_1)$ and $P(S_j \mid D_2)$ were
set equal to .5 (i.e., $P(S_j \mid D_1)$ and $P(S_j \mid D_2)$ were equally
likely and thus had no influence on the associated
probabilities).
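
A minimal sketch of this calculation is given below, assuming the conditional independence of equation (3) and the two-hypothesis case. The function name, the use of `None` to mark a missing result, and the numerical values are illustrative assumptions; this is not the FORTRAN program of Appendix B.

```python
from math import prod


def posterior_chd(prior, likelihoods):
    """P(CHD | S_1..S_n) for two hypotheses under conditional independence.

    likelihoods holds one (P(S_j | CHD), P(S_j | no CHD)) pair per indicator;
    None marks a missing result and is replaced by (0.5, 0.5).
    """
    pairs = [(0.5, 0.5) if pair is None else pair for pair in likelihoods]
    joint_chd = prior * prod(p for p, _ in pairs)
    joint_no_chd = (1.0 - prior) * prod(q for _, q in pairs)
    return joint_chd / (joint_chd + joint_no_chd)


# Hypothetical patient: two recorded indicators, then the same patient
# with a third indicator missing from the record.
complete = posterior_chd(0.1, [(0.6, 0.1), (0.7, 0.3)])
with_missing = posterior_chd(0.1, [(0.6, 0.1), (0.7, 0.3), None])
print(round(complete, 4), round(with_missing, 4))
```

The missing indicator multiplies the numerator and the denominator by the same factor of 0.5, so the two posteriors agree, which is the sense in which a missing result has no influence.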

The Bayesian diagnostic model was developed because it
provided several distinct advantages over general
discriminant analysis techniques commonly used for medical decision
making. The first advantage was the use of subjective
a priori probabilities. Each doctor has his own feelings
and experience concerning the probability of CHD in a patient.
The second advantage was that the Bayesian model is
self-updating. After each patient has been diagnosed, his
characteristics can be easily added to the data, providing new
a priori probabilities. This allows the doctor to see trends
that may develop, providing the stimulus for research in
these areas. The data base is continuously enlarged in this
manner, improving the diagnostic accuracy of the model. The
third advantage is that CHD is only a small part of the
diagnostic problem facing the doctor. The Bayesian approach
allows for the expansion of the hypothesis. In the present
model only one hypothesis is treated, no CHD or CHD.
However, this could easily be expanded to no disease, CHD,
liver disease, etc. An important aspect of this is that as
the number of data points in the data vector and the number
of hypotheses are increased, the accuracy of the model
improves.
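
The self-updating property can be illustrated with running frequency counts. The sketch below is an assumption about how such bookkeeping might be organized, not a description of the study's program: the class name, method names, and indicator labels are hypothetical, and it assumes every patient in a group has the indicator recorded. Note also that a prior taken from sample proportions reflects the makeup of the sample, which is why literature values were preferred when the sample was small.

```python
from collections import defaultdict


class FrequencyTable:
    """Running counts from which priors and conditional probabilities are re-estimated."""

    def __init__(self):
        self.group_counts = defaultdict(int)
        self.level_counts = defaultdict(int)  # key: (group, indicator, level)

    def add_patient(self, group, findings):
        """Add one diagnosed patient; findings maps indicator name to observed level."""
        self.group_counts[group] += 1
        for indicator, level in findings.items():
            self.level_counts[(group, indicator, level)] += 1

    def prior(self, group):
        """Sample proportion of patients in the group."""
        return self.group_counts[group] / sum(self.group_counts.values())

    def conditional(self, group, indicator, level):
        """Relative frequency of the level among patients in the group."""
        return self.level_counts[(group, indicator, level)] / self.group_counts[group]


table = FrequencyTable()
table.add_patient("CHD", {"family_history": "positive"})
table.add_patient("CHD", {"family_history": "negative"})
table.add_patient("no CHD", {"family_history": "negative"})
before = table.conditional("CHD", "family_history", "positive")

# A newly diagnosed patient is added and the estimates shift accordingly.
table.add_patient("CHD", {"family_history": "positive"})
after = table.conditional("CHD", "family_history", "positive")
print(before, after, table.prior("CHD"))
```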
V. CLINICAL TESTS AND OBSERVATIONS


Data were derived from three sources. The first source
was generated by testing a sample of individuals undergoing
routine physical examinations at Fort Ord Army Hospital.
A collection sheet was developed to record the data. It
was simple yet comprehensive enough to see if trends
developed in areas not considered important in the initial
analysis (see Appendix A).

The second source of data was the medical records at
Letterman General Hospital, San Francisco. A data sheet
similar to that of the Fort Ord sample was used. However,
several problem areas were encountered. The first was the
problem of definition and interpretation. Many records
showed information such as "positive" family history with
no explanation of what the doctor's opinion was based on.
Others had entries such as "30 pack year history" of
smoking. This type of data does not differentiate between
two packs per day for 15 years or three packs per day for
10 years. Since intensity of smoking may be an important
variable, much valuable data were lost. Another problem in
this area was the omission of data that were assumed to be
normal. If a patient's test result was abnormal, the
result was noted in the patient's record. However, if
nothing was noted, it was not clear whether the test result
was normal or the result had been omitted. It is clear
that personalities become an important factor in the writing
and in the reading of medical records. However, it is felt
that as more records are automated these problems will be
greatly reduced.

The problem of missing data was the major obstacle
encountered from the CHD population. The majority of the
patients did not have all the test results in their files.
The only solution to this problem is to increase the sample
size so that patients with missing data can be removed from
the sample. But since one of the major objectives of this
paper was to develop a method, the missing data problem will
not be considered within this framework. For information
concerning decision making with missing data, see Ref. 4.

The third source of data was the medical literature.
This was used to establish a priori probabilities of CHD
when it was felt that the experimental sample was too small,
making the sample probabilities very sensitive to error
[Ref. 7].

The partitioning of the sample space into two parts,
CHD and no CHD, implied that the subject in the healthy
group was not suffering from any disease, and that a
subject in the CHD group was suffering from CHD only. Other
diseases may have adversely affected the test results of
either group. In the formation of the sample, care was
taken to eliminate all subjects that had other diseases.

In the determination of positive or negative family
history, the age of 65 was considered the cut-off. If a
blood relative had CHD prior to age 65, the result was
positive. Although this cut-off was arbitrary, it was the
one most consistent with the available literature. It
can be easily changed, however, if another cut-off is
desired.

When checking for chest pain, the existence of any 
chest pain that was not categorized as angina was listed 
as undetermined origin since none of the subjects were 
known to have diseases which might explain the pain. 

The reading of the resting EKG was done by a cardiologist 
whose experience and subjective opinions must be considered 


an important part of the data. 
VI. SENSITIVITY ANALYSIS


Sensitivity analysis was conducted in the following
areas:

1. The effect of weighting factors on the ordinal
ranking of the qualitative indicators was investigated.
Table 2 shows the weighting factors used. Changes in the
weighting factors proved to markedly influence the
diagnostic ordering of the indicators shown in Table 3.


TABLE 2

                                   Sample Bayes'      Clinical Judgment
Test                               Weighting Factor   Weighting Factor

Blood Type
  A                                [illegible]        2
  Other                            1                  1
Family History
  Positive                         [illegible]        2
  Negative                         1                  1
Smoking History (per day)
  Non-smokers                      1                  1
  Less than 1/2 pack               4.6                2
  About 1 pack                     4.5                3
  Greater than 1 pack              6.5                4
History of Ischemic Episodes
  None                             1                  1
  Chest pain                       8                  2
  Typical angina                   5                  5
Resting EKG
  Normal                           1                  1
  Other                            4                  2
  ST-T abnormalities               20                 3
  Pathologic Q-waves               [illegible]        4
Race
  Caucasian                        [illegible]        1
  Negro                            1                  2
  Mongolian                        [illegible]        3
Exercise EKG
  Normal                           1                  1
  ST depression < 1 mm             [illegible]        2
  ST depression ≥ 1 mm             150                5


All other indicators were quantitative. The following
ordering of indicators and their associated F-statistics
resulted (Table 3):

TABLE 3

Bayesian Weighting Procedure

Indicator                          F-statistic

History of Ischemic Episodes       [illegible]
Exercise EKG                       [illegible]
Age                                [illegible]
Resting EKG                        2.6744
Blood Type                         [illegible]
Cholesterol                        [illegible]
Density                            [illegible]
Cigarette Smoking                  [illegible]
Systolic Blood Pressure            [illegible]
Family History                     [illegible]
Triglycerides                      [illegible]

Diastolic Blood Pressure and Race were omitted because of
an insignificant F value for this particular sample.

Sample Clinical Weighting Procedure

Indicator                          F-statistic

Exercise EKG                       [illegible]
History of Ischemic Episodes       [illegible]
Density                            [illegible]
Blood Type                         [illegible]
Cholesterol                        [illegible]
Systolic Blood Pressure            [illegible]
Resting EKG                        [illegible]
Cigarette Smoking                  [illegible]
Family History                     [illegible]
Diastolic Blood Pressure
Triglycerides
Age
Race

2. Diagnostic accuracy was investigated by varying the
prior probability of disease, $P(D)$, and assuming
$P(D \mid S_1,\ldots,S_n) > 0.5$ indicated CHD. These values for
Table 4 were determined from patients having eight or more
test results.


TABLE 4

P(D)      False Negatives**     False Positives*

.04       9/50                  0/52
.10       6/50                  0/52
.20       5/50                  0/52
.30       5/50                  [illegible]/52
.40       3/50                  1/52
.90       2/50                  1/52

** False Negative = patient has CHD but is diagnosed as
not having CHD.

* False Positive = patient does not have CHD but is
diagnosed as having CHD.


3. After the model had been developed and the
conditional probabilities had been determined, data on CHD
patients were obtained from Walter Reed Hospital. Using the
originally determined probabilities, these patients were
tested with the Bayes' diagnostic model and 12 out of 14
were correctly diagnosed as having CHD. Again, a
$P(D \mid S_1,\ldots,S_n) > 0.5$ indicated CHD.

The Walter Reed patients were then added to the original
sample to update the prior probability of disease. The
changes in the prior probabilities were so small that they
had no effect on the diagnostic results.
4. Diagnostic accuracy was investigated by varying
that probability above which CHD would be indicated
(Table 5):

TABLE 5

P(D|S1,...,Sn)     False Negatives     False Positives

.1                 5/58                [illegible]
.2                 4/58                0/52
.3                 8/58                0/52
.4                 9/58                0/52
.5                 9/58                0/52
.6                 9/58                0/52
.7                 9/58                0/52
.8                 11/58               0/52
.9                 [illegible]         0/52
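
The effect of the cut-off can be sketched by counting errors at several thresholds. The posterior values below are hypothetical and serve only to show the mechanics; they are not the study's data, and the function name is an assumption.

```python
def error_counts(posteriors_chd, posteriors_no_chd, cutoff):
    """Return (false negatives, false positives) when a posterior above cutoff indicates CHD."""
    false_negatives = sum(1 for p in posteriors_chd if p <= cutoff)
    false_positives = sum(1 for p in posteriors_no_chd if p > cutoff)
    return false_negatives, false_positives


# Hypothetical posterior probabilities computed for patients of known status.
posteriors_chd = [0.95, 0.80, 0.45, 0.30, 0.15]
posteriors_no_chd = [0.02, 0.05, 0.10, 0.25]

for cutoff in (0.2, 0.5, 0.8):
    print(cutoff, error_counts(posteriors_chd, posteriors_no_chd, cutoff))
```

Lowering the cut-off can only reduce false negatives and can only increase false positives, so the choice of cut-off is a trade-off between the two kinds of error.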
VII. RESULTS AND CONCLUSIONS


As previously stated in Section I, the objectives of
the study were to design a method for the evaluation of the
diagnostic potential of available indicators of CHD and to
present a systematic, quantitative procedure for aiding in
its diagnosis. The indicators of CHD were investigated by
comparing specific test results from a CHD sample and a
healthy sample with no CHD.

The stepwise discriminant analysis, as presented in
Section IV.A., using all variables was performed on a CHD
sample size of 106 compared to a no CHD sample size of 56.
The weighting factors were determined by the Bayesian
approach (tabulated in Table 2, Section VI). An important
result of the discriminant analysis program was the ordering
of variables and their associated F-statistics which may
be viewed as an ordering of the relative diagnostic
importance of the tests (see Table 3, Section VI). This method
of assigning weighting factors to test results in
conjunction with discriminant analysis is a valid procedure for
ranking the vector of tests in their diagnostic importance.
It provides a means for a doctor at decision node $D_2$ (in
Figure 1) to determine which test provides the most additional
information $I'$ from those available to him. Additionally,
the method is particularly valuable and easily adapted to
considering new indicators of disease where no definitive
clinical judgment exists or doctors do not agree on the
relative importance of test results.

The Bayes' diagnostic model (Section IV.B.) was
developed to provide a systematic, quantitative procedure
for aiding in the diagnosis of CHD. It was evaluated by
checking how well it diagnosed patients from a known CHD
group and a known healthy group. The difficulty in
obtaining patients with all the required test results was noted
in Section V and resulted in extremely small samples with
complete data to investigate. However, six out of seven
of the CHD group were diagnosed correctly, and 33 out of 33
of the no CHD group were diagnosed correctly. When only
eight or more of the test results were available, the model
diagnosed with 91% accuracy (41 out of 50 in the CHD group
were diagnosed correctly and 52 out of 52 of the no CHD
group were diagnosed correctly). These results were based
on using a posterior probability of disease of .50 as the
cut-off probability (i.e., $P(D \mid S_1,\ldots,S_n) > .50$ indicated
CHD). The variation of the cut-off probability (see
Section VI) demonstrated that the diagnostic accuracy
of the model was greatly influenced by the choice of the
cut-off criterion. For example, using a cut-off of .20
instead of .50 reduced the number of false negatives from
nine to four while the number of false positives remained
the same.

As a validation of the Bayes' diagnostic model, 14
known CHD patients from Walter Reed Hospital were diagnosed
by the model. Twelve of the 14 were diagnosed correctly.
The validation is not conclusive because of the extremely
small sample tested, but it does indicate that the method
is promising.

It may be desirable to use the methods presented in
a screening program to identify people with high risk of
CHD from a large population. Sufficient doctors may not
be available to examine all of the people to be tested. As
an example of the model's applicability to such a screening
program (where a doctor is not required), diagnostic accuracy
was investigated using the results of the available
information only [referred to in Figure 1 as $I$ and in Table 1
as (A)]. The model diagnosed with 92% accuracy (19 out of
[illegible] in the CHD group were diagnosed correctly and 44 out of
[illegible] of the no CHD group were diagnosed correctly).

The Bayesian diagnostic model had a high degree of
accuracy in correct diagnoses. It is easily implemented
and appears to be well adapted to screening studies where
a large population is involved. The model continuously
updates the available patient information from which the
conditional probabilities are calculated and may be useful
in indicating trends or fluctuations in the indicators of
disease.
VIII. AREAS FOR FURTHER STUDY


As pointed out previously (see Section IV.A), one of
the main advantages of the approach followed in the paper is
the easy expansion of the number of variables and the number
of patients to be tested. This implies that as the number
of variables is increased, the diagnosis of CHD will improve.
The expanded list of variables could also be used to
predict other diseases. Instead of a space of CHD and no
CHD, there is a space of CHD plus other diseases, limited
only by logical considerations such as the time, money,
availability of computational equipment, etc. The
integration of this expanded prediction model into routine physical
examinations and patient history could allow preliminary
diagnosis prior to consultations with doctors, helping to
reduce costs and the increasing patient load of doctors.

As presently modeled, diagnosis is based on results of 
samples from diseased and non-diseased groups. However, 
as more samples are obtained and a history of the patient's 
variables (i.e., changes in blood pressure over several 
years) is made, the model could be modified to diagnose on 
the basis of change in a patient's variables rather than by 
comparison with a norm. This would improve diagnosis among 
persons suffering from one disease where the diagnosis is 
being complicated by the existence of another disease. 

The extension of the model to include the diagnosis of
women would require only a change in the prior probability
to include a test for sex. Additionally, a statistical
check of the indicators would be necessary to determine if
a new data base including women would be necessary if
women were to be tested.

Once a person has been found to have CHD, a system to
monitor his progress under dieting and exercise control
could be developed from the present model. This could
allow a technician rather than a doctor to periodically
check the patient's indicators.

The definitions used for positive tests throughout
this study were based on current information. Both a
statistical and medical investigation in this area to
better define test results could greatly improve future
models developed on the same principles.

A model to predict the cost of implementing and 


operating the proposed diagnostic model should be explored 
APPENDIX A: SAMPLE DATA COLLECTION SHEET


Name
Date
Race:  CAU  NEG  MON
Sex          Height          Blood Pressure
Age          Weight          Blood Type

Family History: Any of the following diagnosed heart diseases
(circle)

    Father    Uncle
    Mother    Brother    Unknown    None
    Aunt      Sister

Any of the following died of heart disease (circle)

    Father    Uncle
    Mother    Brother    Unknown    None
    Aunt      Sister

Cigarette smoking in excess of one year?  Yes  No

    If yes:  less than 1/2 pack per day
             one pack per day
             more than one pack per day

History of Ischemic episodes:

    Chest pain, undetermined origin
    Typical angina
    None

Resting EKG:

    Normal
    ST-T abnormalities
    Pathologic Q waves
    Other

Exercise EKG:

    Neg
    ST depression greater than 1 mm
    ST depression greater than 2 mm
    ST elevation

Triglycerides
Cholesterol
Max Heart Rate Attained during Exercise Test

APPENDIX B
BAYES' DIAGNOSTIC MODEL, FORTRAN FLOW CHART


    Start
      |
    Read in titles, probabilities, and patient's test results
      |
    Age present?  -- No --> (2)
      | Yes
    Calculate new probability of disease
      |
(2) Check for 11 more variables as above. Calculate new
    probability of disease for each.
      |
    Print the patient's results
      |
    Stop

DIAGNOSTIC MODEL FORTRAN PROGRAM LISTING